OpenCL: add initial FA support #14987


Merged (6 commits) on Aug 16, 2025
Conversation

rmatif
Collaborator

@rmatif rmatif commented Jul 31, 2025

This PR introduces F16/F32 FA (flash attention) support for the OpenCL backend. It has been extremely challenging to achieve good performance on this kind of hardware, but I believe it is now decent enough to serve as a baseline that we can further iterate on. I also believe there is room for improvement for tg (token generation).
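For readers unfamiliar with FA, its core is the online-softmax recurrence, which computes attention over KV tiles without materializing the full score matrix. The NumPy sketch below illustrates that recurrence only; it is not the OpenCL kernel from this PR, and the tile size is arbitrary.

```python
import numpy as np

def flash_attention(Q, K, V, tile=32):
    """Single-head attention computed tile-by-tile with the online-softmax
    recurrence. Reference sketch only, not the PR's OpenCL kernel."""
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    O = np.zeros_like(V)             # running (unnormalized) output
    m = np.full(n, -np.inf)          # running row-wise score maximum
    l = np.zeros(n)                  # running softmax denominator
    for j in range(0, n, tile):
        S = (Q @ K[j:j + tile].T) * scale        # scores for this KV tile
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)                # rescale previous partials
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=1)
        O = O * alpha[:, None] + P @ V[j:j + tile]
        m = m_new
    return O / l[:, None]

def naive_attention(Q, K, V):
    """Full-matrix reference for comparison."""
    S = (Q @ K.T) / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V
```

The tiled and naive versions agree numerically; the win on GPUs is that only one `tile`-wide slice of scores ever needs to live in on-chip memory at a time.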

Results on Adreno 830:

| model | size | params | backend | ngl | fa | test | t/s |
| ------------ | -------- | ------ | ------- | --- | -- | ----- | ------------- |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 0  | pp512 | 198.69 ± 0.59 |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 0  | tg128 | 21.88 ± 0.85  |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 1  | pp512 | 274.75 ± 1.22 |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 1  | tg128 | 21.58 ± 0.39  |

Adreno 750:

| model | size | params | backend | ngl | fa | test | t/s |
| ------------ | -------- | ------ | ------- | --- | -- | ----- | ------------- |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 0  | pp512 | 139.96 ± 0.51 |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 0  | tg128 | 19.70 ± 0.11  |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 1  | pp512 | 151.22 ± 0.85 |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL  | 99  | 1  | tg128 | 17.94 ± 0.15  |

@rmatif rmatif requested review from max-krasnyansky and lhez July 31, 2025 12:35
@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning OpenCL Issues specific to the OpenCL backend labels Jul 31, 2025
@lhez
Collaborator

lhez commented Aug 1, 2025

@rmatif Very cool, thank you!

@lhez
Collaborator

lhez commented Aug 10, 2025

Sorry, got distracted during the past week. Will come back to this asap.

@lhez
Collaborator

lhez commented Aug 15, 2025

It seems to help small models like qwen2.5-0.5b:

qwen2.5-0.5b-Q4_0

A750

ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.16

| model | size | params | backend | ngl | fa | test | t/s |
| ------------- | ---------- | -------- | ------- | --- | -- | ------ | -------------- |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 1  | pp1024 | 634.68 ± 2.06  |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 1  | tg128  | 35.91 ± 4.13   |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 0  | pp1024 | 283.68 ± 0.56  |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 0  | tg128  | 34.57 ± 2.91   |

A830

ggml_opencl: OpenCL driver: OpenCL 3.0 QUALCOMM build: 0800.35 Compiler E031.47.18.28

| model | size | params | backend | ngl | fa | test | t/s |
| ------------- | ---------- | -------- | ------- | --- | -- | ------ | --------------- |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 1  | pp1024 | 1079.08 ± 1.04  |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 1  | tg128  | 97.31 ± 0.49    |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 0  | pp1024 | 388.14 ± 15.41  |
| qwen2 1B Q4_0 | 265.25 MiB | 494.03 M | OpenCL  | 99  | 0  | tg128  | 95.84 ± 0.94    |

@lhez
Collaborator

lhez commented Aug 16, 2025

Current implementation works well for small models (e.g., qwen2.5-0.5B), significantly improving pp performance. For larger models, larger configs (e.g., {128, 128, 32, 32} for qwen2.5-1.5B) are used; these configs seem to result in spilling into global memory.

We will use this implementation as the baseline and do further investigations and improvements.
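For intuition about why the larger configs might spill: a rough per-workgroup on-chip footprint for an F16 FA tile grows with the product of the tile dimensions. Treating the first two config values as row/column tile sizes and assuming a head dimension of 128 are illustrative assumptions here, not facts taken from the kernel.

```python
def fa_tile_bytes(br: int, bc: int, head_dim: int, elem_size: int = 2) -> int:
    """Rough on-chip bytes for one F16 FA tile: a Q tile (br x d), K and V
    tiles (bc x d each), plus the score tile (br x bc). Illustration only;
    the real kernel's data layout may differ."""
    return (br * head_dim + 2 * bc * head_dim + br * bc) * elem_size

# a {32, 32}-style tile vs a {128, 128}-style tile, head_dim = 128 (assumed)
print(fa_tile_bytes(32, 32, 128))    # 26624 bytes, ~26 KiB
print(fa_tile_bytes(128, 128, 128))  # 131072 bytes, 128 KiB
```

Against the few tens of KiB of local memory typical for mobile GPUs, the larger configuration clearly cannot stay on chip, which would be consistent with the observed spilling into global memory.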

@lhez lhez merged commit 912ff8c into ggml-org:master Aug 16, 2025
46 of 47 checks passed